Description

[0001] The present invention relates to image processing, and more particularly relates to techniques providing
text/image selection from document images.
[0002] Conventional word processor applications for the personal computer enable a user to select text or image
portions within a document, corresponding to an electronically stored file, by means of button presses and dragging
of a mouse cursor.

[0003] Such features are also known for document processing systems as described in WO 9605570A1. Such
document processing systems are used for archiving purposes and comprise a user interface and a memory for storing
bitmap data representing a document that includes text. The user interface includes a display for visualizing an image
of the bitmap data and an input device such as a mouse for specifying locations within the displayed image corresponding
to locations within the stored bitmap data. A document processing system further includes a bitmap data processor
that is responsive to a first specified location designating a start of an area of the image containing text, and to a
second specified location designating a termination of the area of the image containing text, for processing bitmap data
corresponding to the area. The bitmap processor operates to determine a lateral extent of lines of text within the area,
to determine an amount of slope, if any, of the lines of text within the area, to determine a center-to-center spacing of
the lines of text within the area, and to determine a location of a top line of the text. That is, the bitmap processor
operates to refine the boundary of the area specified by the input device so as to provide a geometric specification of
all text appearing within the bitmap data that corresponds to the originally specified area. The bitmap processor
preferably operates to first laterally compress the bitmap data prior to operating on the same.

[0004] The situation is quite different when the displayed document is that captured by a document camera providing
greyscale, and usually relatively low resolution, images, such as those employed in over-the-desk scanning systems.
It is known to use in such over-the-desk scanning systems a video camera disposed above a desk and capturing
images of documents which are displayed to a user on a CRT monitor or other display device: these are discussed in
detail, for example, in EP-A-622,722 (applicants' reference R/93003K/JDR) and British patent application 9614694.9
(applicants' reference R/96007/JDR), published as EP-A-840 199 (6 May 1998). The capture of the document images
may be for display in situ, or for transmission to a remote location as part of a videoconferencing tool.
[0005] However, a problem encountered with such systems is how to provide a very efficient text selection interface
for interactive face-up document camera scanning applications. There is a need for techniques supporting the selection
of rectangular regions of text and images within a document captured by the camera via a "click-and-drag" of the mouse
defining two points, or a leading diagonal, and for techniques providing, in much the same way as a word processor
interface, for single and multi-word text selection from such a document.

[0006] It is, therefore, an object of the present invention to provide an image processing system which can be used
as a communication tool, for example, for display in situ, or as part of a video conferencing tool, and which provides
a text selection interface in much the same way as a word processor interface for single and multi-word text selection
from such a document.

[0007] The present invention provides a method as recited in claim 1.

[0008] The method preferably further comprises the step of: (h) extracting the image from within the selection ele- 
ment. 

[0009] The method preferably further comprises the step of: (i) rotating the extracted image through the determined
skew angle (θ), in the opposite sense.

[0010] The invention further provides a computer program product directly loadable into the internal memory of a 
digital computer comprising software code portions for performing the steps of the aforementioned method. 
[0011] The invention further provides a programmable image processing system when suitably programmed for
carrying out the method of any of the appended claims or according to any of the particular embodiments described
herein, the system including a processor, a memory, an image capture device, an image display device and a user
input device, the processor being coupled to the memory, image capture device, image display device and user input
device, and being operable in conjunction therewith for executing instructions corresponding to the steps of said
method(s).

[0012] In the case where the user employs a "click-and-drag" of the mouse defining two points, the remaining degree
of freedom in the selection of a rectangle in the image is the skew of the text. The invention applies skew angle
detection techniques to this document camera case, where the location of "skew-pertinent" information is supplied by
the user with the mouse, an intermediate image having been extracted from the underlying greyscale image. The method
is fast enough to find skew in less than 0.5s for most font sizes, which is fast enough to provide a pleasing interface.
A similar effect is obtained for single-word and multi-word selection techniques.

[0013] Embodiments of the invention will now be described, by way of example, with reference to the accompanying 
drawings, in which: 
Figure 1 is a view from above a desk of a document from which a text portion is to be selected in an over-the-desk
scanning system according to an embodiment of the present invention;
Figure 2 shows the same view as in Fig. 1, after a user has finished selecting the text portion;
Figure 3 is a flow chart of the steps in providing text selection in accordance with an embodiment of the present
invention;
Figure 4 shows the substeps employed in implementing the skew detection step in Fig. 3;
Figure 5 shows the effect of the substep in Fig. 4 of computing a high gradient image;
Figure 6 illustrates the effect of varying the size of the test image portion on the effect of the skew detection substep
in Fig. 4;
Figure 7(a) shows a portion of captured and displayed text from which a user makes a selection, and Figure 7(b)
shows in magnified form part of the text matter of Fig. 7(a), showing a selected word;
Figure 8 is a flow chart showing the processing steps performed in providing the selection feedback illustrated in
Fig. 7;
Figure 9 shows in more detail the technique (step s14 in Fig. 8) for determining the local inter-word threshold;
Figure 10 shows how a histogram is formed of horizontal gaps between the connected components of equal font size,
(a) the ideal bimodal distribution, (b) real data with two attempted curve fittings, and (c) the fitting of a Gaussian
curve;
Figure 11 illustrates text selection by diagonal sweep in an alternative embodiment of the invention; and
Figure 12 is a flow chart of the steps performed in providing the selection feedback shown in Fig. 11.

[0014] There are described below various techniques for text and/or image selection. It will be appreciated that these
techniques may be used in conjunction with the image enhancement and thresholding techniques described in European
patent application EP-A- , based on British patent application 9711024.1 (applicants' ref: R/97008/JDR), filed
concurrently herewith.

A. System configuration 

[0015] It will be appreciated that the techniques according to the invention may be employed in any system or
application where selection of a text portion from a multibit-per-pixel (e.g. greyscale or colour) image is required. Such
instances include videoconferencing systems, scanning systems, especially the aforementioned over-the-desk scanning
systems, multifunction devices, and the like. It will be appreciated that the invention may be implemented using
a PC running Windows™, a Mac running MacOS, or a minicomputer running UNIX, which are well known in the art.
For example, the PC hardware configuration is discussed in detail in The Art of Electronics, 2nd Edn, Ch. 10, P. Horowitz
and W. Hill, Cambridge University Press, 1989. In the case of over-the-desk scanning, the invention may form part of
the systems described in any of EP-A-495,622, EP-A-622,722, or EP-A-840 199, based on British patent application
9614694.9 (applicants' reference R/96007/JDR) filed 12.7.96. The invention has been implemented in C++ on an IBM
compatible PC running Windows® NT.

B. Rectangular text region selection via skew detection 


[0016] This section describes a text selection technique that enables rectangular text region selection. The user
defines a leading diagonal of the rectangle with a mouse. Automatic text skew detection is used to calculate the required
image selection. Skew recovery is made efficient by analysing the image in the neighbourhood of the mouse input.
[0017] Figure 1 is a view from above a desk of a document from which a text portion is to be selected in an
over-the-desk scanning system incorporating an embodiment of the present invention.

[0018] Initially, a document 2 is open on the user's desk (not shown), and the user has positioned the document 2
so that the paragraph 4 which he wishes to scan/copy is within the field of view 6 of the camera (not shown). Images
(greyscale) of the document 2 are captured and displayed to the user as feedback. As discussed in EP-A-840 199
(R/96007/JDR), the content of the field 6 may be displayed (as live video images) within a window of any suitable
display device, such as a CRT or LCD display. Using a conventional mouse, the user is able to control the cursor
position in a familiar way; and the selection of the paragraph 4 begins with the user pressing the left mouse
button with the cursor at initial position 8. While the left mouse button remains pressed, the user makes a generally
diagonal line (top left to bottom right): an intermediate cursor position 8' during this motion is shown.
[0019] Figure 2 shows the same view as in Fig. 1, after a user has finished selecting the text portion: the end of
selection is signalled by the user releasing the left mouse button when the cursor is at the final cursor position 8". As
can be seen, the text of document 2 is skewed with respect to the coordinate space of the camera's field of view 6: the
angle of skew θ must be determined.

[0020] Figure 3 is a flow chart of the steps in providing text selection in accordance with an embodiment of the present
invention. Initially, the start of selection user input is detected (step s1). Immediately (step s2), the image (i.e. within
the field 6) displayed to the user is frozen on the display device (not shown). Next, a routine (s3) is performed to
determine the skew angle θ, as is described in further detail below. Returning to Fig. 2, once the value of θ is obtained,
the positions within the coordinate space of the display window of a selection rectangle 10 which is to be displayed as
feedback to the user must be determined; the requirement being that, to provide a pleasing interface for the user, the
selection rectangle 10 must be at the same skew angle θ. The coordinates ((x, y), (x', y')) corresponding to the initial
and current, respectively, cursor positions 8, 8" are known. Using simple geometric relations, the coordinates (a, b) and
(c, d) of the other corners of the rectangle 10 can readily be calculated. The skew angle θ is normally a small angle:
generally it will be less than 5°.
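The "simple geometric relations" can be illustrated with a minimal sketch. It assumes that the sides of the rectangle 10 run along and across the text lines, that θ is measured from the horizontal axis of the display, and that the function name and argument order are hypothetical; the description itself does not prescribe them.

```python
import math

def skewed_rectangle(p, q, theta_deg):
    """Corners of a rectangle whose diagonal runs from p to q and whose
    sides are rotated by theta_deg relative to the display axes.

    p corresponds to (x, y), q to (x', y'); the two returned extra
    corners play the role of (a, b) and (c, d). Sign convention of the
    angle is an assumption of this sketch.
    """
    t = math.radians(theta_deg)
    ux, uy = math.cos(t), math.sin(t)   # unit vector along the text lines
    vx, vy = -uy, ux                    # unit vector across the text lines
    dx, dy = q[0] - p[0], q[1] - p[1]
    w = dx * ux + dy * uy               # component of the diagonal along the lines
    h = dx * vx + dy * vy               # component of the diagonal across the lines
    a = (p[0] + w * ux, p[1] + w * uy)
    c = (p[0] + h * vx, p[1] + h * vy)
    return [p, a, q, c]
```

With θ = 0 the result reduces to the usual axis-aligned rectangle; for small angles the two computed corners shift only slightly, which is consistent with the skew normally being below 5°.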

[0021] As shown in Fig. 3, a rectangle is formed (step s5) with (x, y), (x', y'), (a, b) and (c, d) at the corners. This
rectangle is then superimposed (step s6) on the stored frozen image data, and the resulting image displayed. A test
is then made at step s7 to determine whether the user has finished selecting (i.e. whether an input has been received
indicating that he has released the left mouse button); if he has not, processing returns to step s4. (For illustration,
the final cursor position 8" is used as the 'current' cursor position, although it will be appreciated that this process
may be carried out continually during the user's diagonal movement of the cursor.)

[0022] If the user has finished selecting, the current image is frozen (step s8) in the display (window). Then, the image
data for the image (here: the paragraph 4) present within the selection rectangle 10 is extracted (step s9) from that for
the image within the field 6, and the extracted image is then rotated (step s10) through -θ, so as to ready it for further
processing, such as OCR.

[0023] Figure 4 shows the substeps employed in implementing the skew detection step in Fig. 3. This routine is
based on the techniques described in US-A-5,187,753 and US-A-5,355,420: maximising the variance of the laterally
projected profile of differences over a range of skew angles, where the rotation of the image is made efficient by only
performing vertical shears. The process begins (step s31) by initialising a grab area 12 (a small rectangular area to
the right and below the cursor position such as that shown in Fig. 6 (discussed further below)). Suitably, the grab area
12 is just large enough for a few lines of text, and perhaps a couple of lines of 10 point text.

[0024] In order to minimise the amount of time taken to compute skew, we attempt to analyse as small a portion
of the image as we can. The algorithm has been found to be capable of resolving skew with less than two lines of text,
but the problem is clearly that it is not known how large the font size is before the skew angle has been determined.
[0025] To this end, an initial sample size (grab area 12) that is large enough to capture several lines of text at a "most
likely" font size of between 10 and 12pt is used. Further, this initial region is to the right and below the initial "click",
which assumes that the mouse is being dragged from top-left to bottom-right and that the skew angle is not too great
(typically ±5 degrees). Because this is the most common font size that is selected using the interface, this gives an
overall optimum response time.

[0026] The next step (s32) involves the computation of a high gradient image from the image within the initial grab
area 12. The images of the document in Fig. 1 are greyscale images. An option is to threshold the image and then
pass it to a skew detection algorithm. However, under uncontrolled lighting conditions, thresholding is potentially quite
computationally expensive.

[0027] Figure 5 shows the effect of the substep in Fig. 4 of computing the high gradient image, which is accomplished
using the familiar Sobel operator (discussed in more detail in Jähne B., Digital Image Processing, section 6.3.2,
Springer-Verlag, 1991). In the resultant high gradient image of Fig. 5, each white pixel is the result of the gradient in
the original (greyscale) image at that point being greater than a predetermined threshold, and each black pixel is the
result of the gradient in the original (greyscale) image at that point being less than the predetermined threshold. The
high-gradient image (used as the input for the skew detection) is easily computed from the greyscale supplied by the
document camera and is a very reliable substitute for a fully thresholded image.
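As an illustration of this substep, the following is a minimal pure-Python sketch of a thresholded Sobel gradient. It assumes the greyscale image is a list of rows of integers, approximates the gradient magnitude by |gx| + |gy|, and leaves border pixels black; the function name and the threshold value used by a caller are hypothetical, not taken from the described implementation.

```python
def high_gradient_image(grey, threshold):
    """Return a binary image: 1 (white) where the Sobel gradient of the
    greyscale image exceeds the threshold, 0 (black) elsewhere.

    The magnitude is approximated by |gx| + |gy| (an assumption of this
    sketch); the one-pixel border is left at 0.
    """
    h, w = len(grey), len(grey[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal Sobel kernel: right column minus left column.
            gx = (grey[y - 1][x + 1] + 2 * grey[y][x + 1] + grey[y + 1][x + 1]
                  - grey[y - 1][x - 1] - 2 * grey[y][x - 1] - grey[y + 1][x - 1])
            # Vertical Sobel kernel: bottom row minus top row.
            gy = (grey[y + 1][x - 1] + 2 * grey[y + 1][x] + grey[y + 1][x + 1]
                  - grey[y - 1][x - 1] - 2 * grey[y - 1][x] - grey[y - 1][x + 1])
            if abs(gx) + abs(gy) > threshold:
                out[y][x] = 1
    return out
```

Because only local differences are thresholded, slow changes in illumination across the page have little effect on the result, which is in line with the reliability argument made above.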

[0028] Computations are next performed on the image data for the high gradient image for each of the allowed set
of skew angles (e.g. +5° to -5° in increments of 0.1°, although any suitable regime may be employed; see
US-A-5,187,753 and US-A-5,355,420): step s33. In each case, the image is sheared (step s34) to approximate the rotation.
Here, a technique used in the vertical shearing procedure that lies at the heart of the angular search is to wrap around
the vertical shift. In other words, the pixels that are pushed out of the top of the region are re-inserted at the bottom in
the same column. This way the variance profiles are always calculated on a rectangular region which makes everything
neater, if not more reliable.
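The wrap-around shear can be sketched as follows. This is an illustrative version only: it assumes the image is a list of rows, shifts each column by a whole number of pixels proportional to its x position, and uses a hypothetical function name.

```python
import math

def shear_wrap(img, theta_deg):
    """Approximate a rotation by theta_deg with a vertical shear.

    Column x is shifted vertically by round(x * tan(theta)); pixels that
    leave one edge of the region re-enter at the opposite edge of the
    same column, so the result stays rectangular.
    """
    h, w = len(img), len(img[0])
    slope = math.tan(math.radians(theta_deg))
    out = [[0] * w for _ in range(h)]
    for x in range(w):
        shift = int(round(x * slope))
        for y in range(h):
            out[(y + shift) % h][x] = img[y][x]
    return out
```

Every column keeps exactly the same pixels, only cyclically displaced, so the total count of high-gradient pixels in the region is the same for every candidate angle.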

[0029] For the given angle, a lateral projection histogram for the image is computed (step s35). Based on the
histogram data, the variance for the given angle is calculated (step s36). A plot of variance against angle (of rotation)
may thus be produced, as shown in Fig. 6(a). The ability of the technique to determine the skew angle depends on the size
of the initial grab area 12 relative to the font size; and the absence of a discernible peak in Fig. 6(a) indicates that the
computation has been unsuccessful. A test is made at step s37 to determine whether the highest peak in the plot of
skew variance vs angle is significant (a simple SNR based test), such as by determining whether the ratio of the peak
value to the average value is greater than a predetermined value. If the peak is not significant, the size of the initial
grab area 12 is increased (step s38), and the processing returns to step s32. The grab area 12 is expanded in the vertical
direction more than the horizontal as it is in that direction that the most skew-pertinent information lies. This is done
until an empirically defined threshold on the SNR (in this case defined to be the maximum variance divided by the mean
variance) is reached.
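Steps s33 to s37 can be sketched together as below. This is a simplified illustration rather than the cited patented method: it takes the variance of plain row sums of the sheared binary image (not of a profile of differences), and the function names, the default angle range and the `min_snr` value are assumptions. The returned value is the shear angle that best aligns the text rows, so the skew of the text itself has the opposite sign under this convention.

```python
import math

def profile_variance(img, theta_deg):
    """Variance of the row sums after a wrap-around vertical shear."""
    h, w = len(img), len(img[0])
    slope = math.tan(math.radians(theta_deg))
    rows = [0] * h
    for x in range(w):
        shift = int(round(x * slope))
        for y in range(h):
            rows[(y + shift) % h] += img[y][x]
    mean = sum(rows) / h
    return sum((r - mean) ** 2 for r in rows) / h

def detect_skew(img, max_deg=5.0, step_deg=0.1, min_snr=2.0):
    """Search the allowed angles for the variance peak.

    Returns the best shear angle in degrees, or None when the ratio of
    maximum to mean variance is below min_snr (peak not significant).
    """
    n = int(round(2 * max_deg / step_deg))
    angles = [-max_deg + i * step_deg for i in range(n + 1)]
    variances = [profile_variance(img, a) for a in angles]
    mean_var = sum(variances) / len(variances)
    best = max(range(len(angles)), key=variances.__getitem__)
    if mean_var == 0 or variances[best] / mean_var < min_snr:
        return None
    return angles[best]
```

A `None` result corresponds to the "no discernible peak" case, in which the caller would enlarge the grab area and try again.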

[0030] Figures 6(b) and (c) illustrate the effect of varying the size of the grab area 12 on the effect of the skew
detection substep in Fig. 4, in the case where the font size is 36pt. Clearly, a significant peak is ascertained for
Fig. 6(b), from which a skew angle of 0.35° can be derived. This shows that very little text is needed for a good skew
confidence: the grab area 12 of Fig. 6(b) is sufficient for the determination, and there is no need to expand to the larger
area of Fig. 6(c). In a preferred embodiment, the first grab area 12 is 100x100 pixels, the next largest is 200x200 pixels,
and the next largest 300x300 pixels. If the latter fails, a value of θ=0 is returned.
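The growth of the grab area in the preferred embodiment can be sketched as a simple retry loop. The skew estimator is passed in as a callable and is assumed to return `None` when no significant peak is found; that interface, the function name and the clipping at the image border by slicing are assumptions of this sketch.

```python
def skew_with_growing_grab_area(image, click_x, click_y, estimate_skew,
                                sizes=(100, 200, 300)):
    """Try successively larger square grab areas below and to the right
    of the click position; fall back to a skew of 0 if all of them fail.
    """
    for size in sizes:
        rows = image[click_y:click_y + size]
        area = [row[click_x:click_x + size] for row in rows]
        angle = estimate_skew(area)  # None means "peak not significant"
        if angle is not None:
            return angle
    return 0.0
```

Trying the smallest area first keeps the common 10 to 12pt case fast, and only larger fonts pay for the extra passes.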

[0031] The above description outlines the situation where text matter (paragraph 4) is sought to be selected by the 
user. However, the techniques according to the invention may be used for the selection of graphical objects within a 
document, and the aforementioned techniques have also been found to work well with graphics and line drawings. 
[0032] The algorithm described in this section is very efficient, and the delay between starting to drag out the leading
diagonal and the skew being detected is of the order of 0.5s on standard PC hardware, and slightly longer for larger
and less common font sizes.

[0033] In addition, it will be appreciated that provision may be made, suitably using techniques for the resizing and
moving of windows in the MS Windows environment, allowing the user to resize and/or reposition the selection rectangle
10 after it has been formed by releasing the left mouse button.

C. Single and multi-word selection methods 

[0034] This section describes a single and multi-word text selection process, the aim being to imitate a common
word processor interface, i.e. double click selects a word and "click-and-drag" may define a non-rectangular text region.
[0035] Figure 7(a) shows a portion 13 of captured and displayed text from which a user makes a selection. The
selection may be of a single word, or of multiple consecutive words.

[0036] In this case, the user selects, using a cursor controlled by a mouse (not shown) in the conventional manner,
from a portion 13 of displayed text matter a word 14 ("compression"): the user performs a "double-click" with the left
mouse button with the mouse cursor in an initial position 8. As is shown (slightly exaggerated for the sake of
illustration), the text matter is skewed by an angle θ with respect to the display coordinate space. Appropriate feedback
must be displayed to the user overlaid on the word 14, to show that it has been selected.

[0037] Figure 7(b) shows in magnified form part of the text matter of Fig. 7(a), showing a selected word. To indicate
selection, a selection block 20 is displayed overlaid on the word 14. (Here the block 20 is shown using hatching for the
sake of illustration, but generally will comprise a solid black or coloured block, with the characters of the word 14
appearing as white or "reversed out".) The selection block 20 has vertical sides 22, 24 and horizontal sides 26, 28.
The sides 22, 24 are positioned midway between the selected word 14 and the two adjacent words in the line ("analog"
and "curve" respectively), and for this, computations based on measured values of the inter-character separation (s_c)
and the inter-word spacing (s_w) must be made, as described further below.

[0038] In addition, the sides 26, 28 are positioned midway between the line containing the selected word 14 and the
line of text above and below it, respectively. The sides 26, 28 are also skewed by θ with respect to the horizontal
dimension of the display, thereby providing appropriately-oriented selection feedback (block 20).
[0039] Figure 8 is a flow chart showing the processing steps performed in providing the selection feedback illustrated
in Fig. 7. Initially (step s11), a user's double click of the left mouse button (i.e. first and second user inputs in rapid
succession) is detected; and the displayed image is immediately frozen (step s12; although it will be appreciated that
the freezing will occur upon the first of the "clicks" being made).

[0040] An operation is then performed (step s13), using a small region near (typically below and to the right of) the
initial cursor position 8 (at the first mouse click), to determine the angle θ at which the text is skewed: this is described
in detail above, with reference to Fig. 4. Then, a routine (s14) is performed to generate, for the small local region, an
estimate of the inter-word spacing (threshold) (s_w)_min (corresponding to (s_c)_max), i.e. the threshold spacing above
which the spacing must be an inter-word spacing rather than an inter-character spacing. A determination is then made
(step s15), using known techniques, of the line separation within the small local region: this is the separation between
the maximum height of characters on one line and the lowest level for characters on the line above it; and this enables
the positions of the sides 26, 28 of the selection block 20 (Fig. 7(b)) to be determined.

[0041] In step s16 this determination is made, together with a calculation of the positions of the sides 22, 24 of the
selection block 20: side 22 is 1/2(s_w)_min to the left of the character "c" in the selected word 14 ("compression"), and
side 24 is 1/2(s_w)_min to the right of the "n" in the selected word 14. The selection block 20 with these sides is formed
in step s17, and then in step s18 the selection block 20 is overlaid on the frozen image and the result displayed.
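The placement of the four sides can be illustrated with a minimal sketch. It assumes all coordinates are expressed in a frame already aligned with the text (i.e. after allowing for the skew θ), with y increasing downwards; the function and parameter names are hypothetical.

```python
def selection_block(word_left, word_right, s_w_min,
                    line_top, line_bottom, prev_line_bottom, next_line_top):
    """Return (left, top, right, bottom) of the selection block.

    The vertical sides lie half the inter-word threshold outside the
    first and last characters of the word; the horizontal sides lie
    midway between the word's line and the neighbouring lines.
    """
    left = word_left - 0.5 * s_w_min
    right = word_right + 0.5 * s_w_min
    top = 0.5 * (prev_line_bottom + line_top)
    bottom = 0.5 * (line_bottom + next_line_top)
    return left, top, right, bottom
```

For a word spanning x = 100 to 180 with a threshold of 8 pixels, the vertical sides fall at 96 and 184. Mapping the block back into display coordinates would then apply the skew rotation to its corners.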
[0042] If the user has finished selecting, the current image is frozen (step s8) in the display (window). Then, the
image data for the image (here: the word 14) present within the selection block 20 is extracted (step s19) from that for
the image, and the extracted image is then rotated (step s20) through -θ, so as to ready it for further processing, such
as OCR. Figure 9 shows in more detail the technique (step s14 in Fig. 8) for determining the local inter-word threshold
(s_w)_min. Initially (step s141), for the local region and using techniques known in the art for computing connected
components, the character-character separations are measured for each pair of adjacent characters. Here, the previously
obtained skew information (θ) is used to make the O'Gorman Docstrum techniques (Lawrence O'Gorman, "The Document
Spectrum for Page Layout Analysis", in IEEE Transactions On PAMI, Vol 15, No. 11, Nov 1993) run faster. O'Gorman
used a connected component nearest neighbours method to find skew and inter-character and inter-line spacing.
We use the skew information to find nearest neighbours in the line to give us inter-character information, and connected
component heights to group blocks of consistent font size.

[0043] A histogram is formed of horizontal gaps between the connected components of equal font size. This is ideally
a bimodal distribution (see Fig. 10(a)), i.e. with a first peak (mode) 36 corresponding to inter-character spacings (s_c),
and a second peak (mode) 38 corresponding to inter-word spacings (s_w). Figure 10(b) shows a plot 40 of the real
measured values for a typical sample, together with two curves sought to be fitted to the plot of the real data curve 40:
a curve 42 for a cubic fitting and a curve 44 for a quartic fitting. The curves 42, 44 intersect with the separation
(distance) axis at I' and I", respectively.
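Forming the gap histogram can be sketched as follows, assuming the connected components of one text line are already available as (left, right) horizontal extents in skew-corrected pixel coordinates. The function name and the clamping of overlapping components to a gap of zero are assumptions of this sketch.

```python
from collections import Counter

def gap_histogram(boxes):
    """Histogram of horizontal gaps between neighbouring components.

    boxes: iterable of (left, right) extents on a single text line.
    Returns a Counter mapping gap width (in pixels) to its frequency.
    """
    ordered = sorted(boxes)
    gaps = [max(0, nxt[0] - cur[1]) for cur, nxt in zip(ordered, ordered[1:])]
    return Counter(gaps)
```

On typical text, the small gaps between characters form the first mode and the larger gaps between words form the second.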

[0044] The intersection (I') for the best fitting curve 42 may be used as the value of (s_w)_min. However, in the preferred
embodiment, an attempt is made to make a best fit of a Gaussian curve to the first mode 36 in Fig. 10(a), and this is
shown in Fig. 10(c). Obtaining the best fitting of the Gaussian curve 46 imputes an "intersection" I on the separation
axis: this value is used as (s_w)_min and must be determined.

[0045] Returning to Fig. 9, step s141 is followed by steps for finding the best fit, using 2k values about the value i=m,
where m is the value at which the curve 46 has its peak. First, k is set to zero (step s142). In step s143, k is incremented
by one, and then values for μ_k (the average of (2k+1) values around the mode) and σ_k² (the variance of (2k+1) values
around the mode) are computed (step s144) according to the equations:

$$\mu_k = \frac{1}{2k+1} \cdots$$

$$\sigma_k^2 = \frac{1}{2k+1} \cdots$$
[0046] The proportion of data in (2k+1) values around the mode is given by:

$$p_k = \frac{\sum_{i=m-k}^{m+k} h(i)}{\sum_{i=0}^{N} h(i)}$$
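The proportion p_k can be evaluated directly from the histogram. The sketch below assumes h(i) denotes the histogram count at separation i, stored as a list indexed from 0 to N, takes the mode m as the index of the largest count, and clips the window at the ends of the histogram; the function name and the clipping are assumptions.

```python
def proportion_around_mode(h, k):
    """Fraction of all histogram counts lying within k bins of the mode."""
    m = max(range(len(h)), key=h.__getitem__)  # index of the mode
    lo, hi = max(0, m - k), min(len(h) - 1, m + k)
    return sum(h[lo:hi + 1]) / sum(h)
```

For the counts `[0, 2, 6, 2, 0, 0, 1, 1]` the mode is at i = 2, and with k = 1 the window covers 10 of the 12 measured gaps.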

[0047] At step s145, an analysis is made of whether the curve is a good fit: this is done using the well known
Chi-squared test. If the test is failed, processing returns to step s143. This is continued until the test of step s145 is
passed. Upon the test being passed the value of (s_w)_min has been found and is equal to μ_k + 3σ_k.

[0048] As can be seen, reliable determination of inter-character spacing enables us to segment out single words
with a double click. Click-and-drag selection of multiple words needs further knowledge of the column limits. This is
done using a technique due to Pavlidis (Pavlidis, T. and Zhou, J., "Page Segmentation and Classification," CVGIP,
Graphical Models and Image Processing, Vol 54, No 6, Nov. 1992 pp. 484-496) based on vertical projection profiles
at the connected component level. Again, the sample image used to establish column boundaries is grown until
confidence measures are above specified levels. This interface has a slightly longer inherent delay than that of section B
for finding skew alone, but with faster hardware this may become unimportant.

[0049] The technique of highlighting a word by a "double-click" has been described. It will be appreciated that, through
simple modification of the techniques of Figs 8-10, and using well known techniques for finding sentence boundaries
and the abovementioned methods of determining text (column) limits, techniques may be provided for indicating
selection, in response to a third mouse click, of the whole sentence containing the word selected by the double click.
I.e., the second click of the user's double click becomes an intermediate user input and the third click the final user
input defining the end of selection.

[0050] Furthermore, it will be appreciated that, through simple modification of the techniques of Figs 8-10, and using
the abovementioned methods of determining text (column) limits, techniques may be provided for indicating selection,
in response to a selection comprising click, drag and release, of the whole of the text matter between the point of the
initial click and the release of the mouse button. I.e., there is a first user input at the first mouse click (with the left
mouse button being held down), an infinite number of intermediate "user inputs" as the cursor is dragged across the
text, and the final user input defining the end of selection when the left mouse button is released. This is illustrated in
Fig. 11 (the column limits are omitted for the sake of clarity/illustration).

[0051] Figure 12 is a flow chart of the steps performed in providing the selection feedback shown in Fig. 11. The
process begins with the user's left mouse button click to start selection. The skew angle is determined as described
hereinabove and then the column boundaries derived using techniques based on the abovementioned work of Pavlidis.
[0052] Next, a small vertical portion of the column is segmented. With knowledge of skew angle and column
boundaries, this step could simply be to segment the whole column to locate the position of each of the words. However,
word segmentation tends to be a slow operation, so instead, we divide the column into small horizontal strips and
segment each of these separately. This segmentation process operates in a separate thread of program execution,
thus allowing the user to freely continue moving the cursor, and for the system to update the selection display. This
leads to a relatively fast interactive display, whilst at the same time allowing anything from a single word to a whole
column to be selected.
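The idea of segmenting the strips in a separate thread can be sketched as below. The word segmenter itself is passed in as a callable and is purely hypothetical here, as are the function name and the returned tuple; the sketch only shows how per-strip results can be published while the interface thread keeps running.

```python
import threading

def segment_strips_in_background(strip_count, segment_strip):
    """Segment the horizontal strips of a column in a worker thread.

    segment_strip(i) is assumed to return the words found in strip i.
    Returns (thread, results, lock); results maps strip index to words
    and should be read while holding the lock.
    """
    results = {}
    lock = threading.Lock()

    def worker():
        for i in range(strip_count):
            words = segment_strip(i)   # slow operation, runs off the UI thread
            with lock:
                results[i] = words     # publish each strip as soon as it is done

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, results, lock
```

The interface code can poll `results` under the lock on each cursor movement and redraw the selection using whichever strips have been segmented so far.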

[0053] As shown in Fig. 11, the selection block 30 thus formed is constituted by an upper block (covering the selected
text on one line) extending from the left side 22' to a side coincident with the column boundary and having a lower side
32, and a lower block (covering the selected text on an adjacent line) extending from the side 24' and having an upper
side 24. It will be appreciated that where the selection extends over 3 or more lines then the selection block also includes
an intermediate block, between the upper and lower ones, which extends between the column boundaries.

Copy to Clipboard 

[0054] Once the user has selected a region, the next step is to copy it to the Windows clipboard. As previously
described, this can be done in a number of different ways. The operations that are performed on the selected region
prior to it being copied depend not only on the way in which it is copied, but also on the way in which it was selected;
Table 1 highlights the operations necessary.



Table 1: Operations necessary for copying a selected region

                    Selection method:
Copy as:            Rectangular selection box    Skewed selection box         Word-to-word selection
------------------  ---------------------------  ---------------------------  ---------------------------
Text                Copy region                  Copy region                  Copy region
                    Binarise                     Rotate                       Rotate
                    OCR                          Binarise                     Mask unwanted text
                                                 OCR                          Binarise
                                                                              OCR

Color image         Copy region                  Copy region                  Copy region
                                                 Rotate                       Rotate
                                                                              Mask unwanted text

Grey-scale image    Copy region                  Copy region                  Copy region
                    Convert color to grey-scale  Rotate                       Rotate
                                                 Convert color to grey-scale  Mask unwanted text
                                                                              Convert color to grey-scale

Binary image        Copy region                  Copy region                  Copy region
                    Binarise                     Rotate                       Rotate
                                                 Binarise                     Mask unwanted text
                                                                              Binarise
[0055] For example, if the region was selected using the skewed selection box and a color image is required, we
first make a local copy of the selected region, de-skew it by rotating it through the skew angle, and then place it on the
clipboard. A more complex example is copying as text following a word-to-word selection. In this case, it is also
necessary to mask out unwanted text from the beginning and end of the first and last lines of text. This is followed by
converting the color image to a black and white image (binarisation), which is then passed to the OCR engine. Finally,
the text returned by the OCR engine is placed on the clipboard.
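
The dependence of the operation sequence on both the selection method and the copy format, as set out in Table 1, can be expressed compactly. The following is a minimal sketch; the function name and the string labels are hypothetical and serve only to mirror the table entries.

```python
SELECTION_METHODS = ("rectangular", "skewed", "word-to-word")
COPY_FORMATS = ("text", "color image", "grey-scale image", "binary image")


def copy_operations(selection: str, copy_as: str) -> list:
    """Return the ordered operations of Table 1 for one table cell."""
    if selection not in SELECTION_METHODS:
        raise ValueError(f"unknown selection method: {selection!r}")
    if copy_as not in COPY_FORMATS:
        raise ValueError(f"unknown copy format: {copy_as!r}")

    ops = ["copy region"]
    # Skewed and word-to-word selections are de-skewed by rotation.
    if selection in ("skewed", "word-to-word"):
        ops.append("rotate")
    # Only word-to-word selection needs partial first/last lines masked.
    if selection == "word-to-word":
        ops.append("mask unwanted text")
    # Format-dependent conversion.
    if copy_as == "grey-scale image":
        ops.append("convert color to grey-scale")
    elif copy_as in ("binary image", "text"):
        ops.append("binarise")
    # Text output additionally passes the binarised image to OCR.
    if copy_as == "text":
        ops.append("OCR")
    return ops
```

For instance, `copy_operations("skewed", "color image")` yields the two steps of the first example above (copy region, rotate), and `copy_operations("word-to-word", "text")` yields the five steps of the more complex one.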

[0056] Of all these operations, one of the most important is the binarisation stage, particularly when followed by
OCR. Due to the low resolution of the camera images, coupled with possible lighting variations, unacceptable results
will be obtained if the camera image is binarised using a simple threshold algorithm. Therefore, the image enhancement
and thresholding techniques of British patent application 9711024.1 (ref. R/97008), published as EP-A-881 596 (2
December 1998), are suitably used.
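
The weakness of a simple threshold under uneven lighting can be seen in a small sketch. The local-mean threshold below is a generic adaptive technique substituted purely for illustration; it is not the method of EP-A-881 596, and the function names, window radius and offset are assumptions.

```python
def global_threshold(img, t=128):
    """Mark a pixel as ink (1) if it is darker than a fixed threshold t."""
    return [[1 if p < t else 0 for p in row] for row in img]


def local_mean_threshold(img, radius=2, offset=10):
    """Mark a pixel as ink (1) if it is darker than its local mean by offset.

    img is a list of rows of grey levels (0 = black, 255 = white).
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Window clipped to the image borders.
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [img[j][i] for j in ys for i in xs]
            mean = sum(window) / len(window)
            row.append(1 if img[y][x] < mean - offset else 0)
        out.append(row)
    return out


# One scan line whose background darkens from left to right, with two
# ink pixels (at indices 2 and 7).
scan_line = [[220, 205, 110, 175, 160, 145, 130, 35, 100, 85]]
fixed = global_threshold(scan_line)
adaptive = local_mean_threshold(scan_line)
```

On this scan line the fixed threshold also classifies the dim background pixels at the right-hand end as ink, whereas the local comparison isolates the two ink pixels.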

[0057] It will be further appreciated that a function may be provided (selectable by the user using a button on a toolbar
of the UI) which, when the user invokes the 'Copy as text' function, enables the line breaks to be removed from the
OCRed text. This is useful, for example, when the text is to be pasted into a word processor. Furthermore, another
such toolbar button may give the user the option of viewing the selection they have copied in a preview window, in
a manner similar to the clipboard viewer on a conventional PC.
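
A minimal sketch of such a line-break removal step is given below, assuming the OCR engine returns a plain string with newline-separated lines; the function name is hypothetical, and words hyphenated across lines are deliberately left untouched.

```python
def remove_line_breaks(ocr_text: str) -> str:
    """Join OCR output lines into one run of text.

    Every run of whitespace, including line breaks, becomes a single space.
    """
    return " ".join(ocr_text.split())


pasted = remove_line_breaks("Once the user has\nselected a region,\n  the next step")
```

The result is a single line that a word processor can re-wrap to its own margins.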



Claims 


1. A text selection method in an interactive face-up document scanning system for transmitting in a videoconference
or for display in situ, comprising:

(a) capturing images successively by an image capturing device,

(b) displaying successive images captured by the image capture device, each image being defined by grey-scale
and/or color image data and containing text matter,

(c) receiving a first user input (S1) defining the start of a selection and a first position (8) within the displayed
image (6),

(d) in response to the first user input (S1), freezing (S2) the displayed image,

(e) determining the skew angle (θ) of text matter with respect to the field of view (6) of the image capture device,

(f) receiving at least one further user input, including a final user input defining the end of a selection, and

(g) for the or each further user input, determining, using the skew angle θ determined in step (e), the position,
shape and dimensions of a selection element in dependence upon at least said first position (8), and

(h) displaying the selection element (S6) superimposed on said frozen displayed image.

2. The method of claim 1, wherein step (e) comprises determining the skew angle of a first portion of text matter at
or adjacent said first position.

3. The method of claim 1 or 2, wherein said final user input defines a second position (8") within the displayed image,
and said selection element comprises a rectangle (10) having two opposite corners (a, b, c, d) coincident with said
first (8) and second (8") positions.

4. The method of claim 1 or 2, wherein the selection comprises the selection of one or more words of said text matter
within said displayed image, and said selection element comprises a selection block overlaying said one or more
words.

5. The method of claim 4, wherein step (g) comprises the substeps of:

(g1) determining a word separation value $(s_w)_{\min}$ from measured values of separation between adjacent pairs
of characters in said text matter, and

(g2) determining the dimensions of the selection block (20) in the direction of flow of said text matter as a
function of the word separation value $(s_w)_{\min}$ determined in step (g1).

6. The method of claim 5, wherein step (g1) comprises the substeps of:

(g1i) using a portion of said text matter, preferably at or adjacent said first position (8), forming a histogram of
frequency versus inter-character spacing $(s_c)$ for each pair of adjacent characters within said portion,

(g1ii) using a plurality of different Gaussian curves (46), determining which curve is a best fitting curve, said
best fitting curve forming a best fit with a predetermined mode of the histogram formed in step (g1i), and

(g1iii) determining an estimate point on the inter-character spacing axis of the histogram at which the best
fitting curve satisfies a predetermined criterion.

7. The method of claim 6, wherein the estimate point corresponds to the value $(s_w)_{\min} = \mu_k + 3\sigma_k$, where:

\[ \mu_k = \frac{1}{2k+1} \sum_{i=m-k}^{m+k} h(i) \]

and

\[ \sigma_k^2 = \frac{1}{2k+1} \sum_{i=m-k}^{m+k} \{ h(i) - \mu_k \}^2 . \]


8. The method of any of claims 4 to 7, wherein step (g) further comprises:

(g3) determining the line spacing $(s_l)$ between adjacent lines of said text matter, and

(g4) determining the dimensions of the selection block (20) in the direction perpendicular to the flow of said
text matter as a function of said line spacing $(s_l)$; and/or wherein step (g) further comprises:

(g5) determining the horizontal (column) limits of the text matter, and

(g6) determining whether said first (8) and second (8") positions are on different lines of said text matter.

9. The method of claim 8, wherein, if the determination in step (g6) is positive, step (g2) further comprises:

(g2i) for an upper portion of the selection block, overlaying text matter between said first position and the right
hand horizontal (column) limit of the text matter, and

(g2ii) for a lower portion of the selection block, overlaying text matter between said second position and the
left hand horizontal (column) limit of the text matter; and/or wherein step (g2) further comprises:

(g2iii) where the line of the text containing said first position is separated by one or more further lines from the
line of text containing said second position, for an internal portion of said selection block (20) overlaying said
one or more further lines, using as the left hand and right hand sides of said internal portion the left hand and
right hand, respectively, horizontal (column) limits of said text matter.

10. A computer program product directly loadable into the internal memory of a digital computer, comprising software
code portions for performing the steps of claim 1 when said product is running on an interactive face-up document
scanning system for transmitting in a videoconference or for display in situ.

11. An interactive face-up document scanning system for transmitting in a videoconference or for display in situ in
which images of documents are captured by an image capture device, the system including a processor, a
memory, an image capture device, an image display device, a user input device and a computer program of claim
10 being loaded into the memory, the processor being coupled to the memory, image capture device, image display
device and user input device and executing the computer program.
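
By way of illustration only, the estimate point of claim 7 can be computed as sketched below, reading the formulas literally as $\mu_k = \frac{1}{2k+1}\sum_{i=m-k}^{m+k} h(i)$ and $\sigma_k^2 = \frac{1}{2k+1}\sum_{i=m-k}^{m+k}\{h(i)-\mu_k\}^2$. Here `h` is assumed to be the histogram as a list of frequencies, `m` the index of the chosen mode and `k` the half-width of the window; the function name is hypothetical, and the selection of the best-fitting curve among several values of `k` is not shown.

```python
import math


def word_separation_estimate(h, m, k):
    """Return mu_k + 3 * sigma_k over the window h[m-k .. m+k]."""
    if k < 0 or m - k < 0 or m + k >= len(h):
        raise ValueError("window [m-k, m+k] must lie inside the histogram")
    window = h[m - k : m + k + 1]
    n = 2 * k + 1
    mu = sum(window) / n                          # mean over the window
    var = sum((v - mu) ** 2 for v in window) / n  # variance over the window
    return mu + 3 * math.sqrt(var)


estimate = word_separation_estimate([0, 2, 4, 2, 0], m=2, k=1)
```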

[Flowchart box: Calculate μ_k, σ_k² from equations (1a), (1b)]

[FIG. 10A: histogram of frequency versus distance between neighbouring blobs]

[FIG. 10B: histogram of frequency versus distance between neighbours]

[FIG. 10C]

[FIG. 11]

[Flowchart boxes: User clicks to start selection; Calculate skew angle; Determine column boundaries; Segment a small vertical portion of the column]